119 research outputs found

    ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations

    We describe PARANMT-50M, a dataset of more than 50 million English-English sentential paraphrase pairs. We generated the pairs automatically by using neural machine translation to translate the non-English side of a large parallel corpus, following Wieting et al. (2017). Our hope is that ParaNMT-50M can be a valuable resource for paraphrase generation and can provide a rich source of semantic knowledge to improve downstream natural language understanding tasks. To show its utility, we use ParaNMT-50M to train paraphrastic sentence embeddings that outperform all supervised systems on every SemEval semantic textual similarity competition, in addition to showing how it can be used for paraphrase generation.
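
    As a rough illustration of the back-translation recipe the abstract describes (not the authors' released pipeline), the sketch below pairs each English reference in a bilingual corpus with the machine translation of its non-English side. The translate_to_english function and the corpus format are placeholders, not part of ParaNMT-50M.

```python
# Illustrative sketch of building paraphrase pairs by back-translation.
# `translate_to_english` is a hypothetical NMT wrapper, not the authors' system.

def translate_to_english(foreign_sentence: str) -> str:
    """Placeholder for a foreign-language -> English NMT model."""
    raise NotImplementedError

def make_paraphrase_pairs(parallel_corpus):
    """parallel_corpus: iterable of (english_reference, foreign_sentence) pairs."""
    for english_ref, foreign_src in parallel_corpus:
        back_translation = translate_to_english(foreign_src)
        # The human-written English reference and the machine back-translation
        # of the same foreign sentence should be (near-)paraphrases.
        if back_translation.strip().lower() != english_ref.strip().lower():
            yield english_ref, back_translation
```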

    Revisiting Recurrent Networks for Paraphrastic Sentence Embeddings

    We consider the problem of learning general-purpose, paraphrastic sentence embeddings, revisiting the setting of Wieting et al. (2016b). While they found LSTM recurrent networks to underperform word averaging, we present several developments that together produce the opposite conclusion. These include training on sentence pairs rather than phrase pairs, averaging states to represent sequences, and regularizing aggressively. These improve LSTMs in both transfer learning and supervised settings. We also introduce a new recurrent architecture, the Gated Recurrent Averaging Network, that is inspired by averaging and LSTMs while outperforming them both. We analyze our learned models, finding evidence of preferences for particular parts of speech and dependency relations. Comment: Published as a long paper at ACL 2017.
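
    A minimal sketch of the "average the recurrent states" idea (my paraphrase of the abstract, not the paper's released code), assuming PyTorch and already-indexed, padded word IDs:

```python
import torch
import torch.nn as nn

class AveragedLSTMEncoder(nn.Module):
    """Sentence embedding = mean of LSTM hidden states over time.
    A sketch, not the authors' implementation; dimensions are arbitrary."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=300):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)

    def forward(self, token_ids, lengths):
        # token_ids: (batch, max_len) LongTensor, lengths: (batch,) LongTensor
        states, _ = self.lstm(self.embed(token_ids))   # (batch, max_len, hidden)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        mask = (positions[None, :] < lengths[:, None]).float()
        summed = (states * mask.unsqueeze(-1)).sum(dim=1)   # ignore padding
        return summed / lengths.clamp(min=1).unsqueeze(-1).float()
```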

    Gaussian Error Linear Units (GELUs)

    We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is xΦ(x), where Φ(x) is the standard Gaussian cumulative distribution function. The GELU nonlinearity weights inputs by their value, rather than gating inputs by their sign as in ReLUs (x·1[x>0]). We perform an empirical evaluation of the GELU nonlinearity against the ReLU and ELU activations and find performance improvements across all considered computer vision, natural language processing, and speech tasks. Comment: Trimmed version of 2016 draft; add exact formula.
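
    The formula can be written down directly; the snippet below (a plain NumPy/SciPy sketch, not the paper's reference code) compares the exact GELU with ReLU and also shows the commonly used tanh approximation.

```python
import numpy as np
from scipy.special import erf  # standard Gaussian CDF via the error function

def gelu(x):
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + erf(x / np.sqrt(2.0)))

def gelu_tanh(x):
    """Common tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def relu(x):
    """ReLU gates by sign: x * 1[x > 0]."""
    return x * (x > 0)

x = np.linspace(-3, 3, 7)
print(gelu(x))   # smooth, slightly negative for small negative inputs
print(relu(x))   # hard zero for all negative inputs
```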

    Learning to Embed Words in Context for Syntactic Tasks

    We present models for embedding words in the context of surrounding words. Such models, which we refer to as token embeddings, represent the characteristics of a word that are specific to a given context, such as word sense, syntactic category, and semantic role. We explore simple, efficient token embedding models based on standard neural network architectures. We learn token embeddings on a large amount of unannotated text and evaluate them as features for part-of-speech taggers and dependency parsers trained on much smaller amounts of annotated data. We find that predictors endowed with token embeddings consistently outperform baseline predictors across a range of context window and training set sizes. Comment: Accepted by ACL 2017 Repl4NLP workshop.
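
    One way to picture token embeddings as tagger features (an illustrative sketch, not one of the paper's specific models): run a bidirectional LSTM over word vectors so each position gets a context-conditioned representation, then hand those per-token vectors to the downstream tagger or parser.

```python
import torch
import torch.nn as nn

class TokenEmbedder(nn.Module):
    """Context-sensitive token embeddings from a BiLSTM. An illustrative sketch;
    the paper explores several simple architectures, not necessarily this one."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=100):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) -> (batch, seq_len, 2 * hidden_dim)
        contextual, _ = self.encoder(self.word_embed(token_ids))
        return contextual  # one vector per token, conditioned on its neighbours

# Downstream, these per-token vectors would be concatenated with the usual
# tagger/parser features at each word position (a usage assumption on my part,
# not a prescription from the paper).
```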

    Emergent Predication Structure in Hidden State Vectors of Neural Readers

    A significant number of neural architectures for reading comprehension have recently been developed and evaluated on large cloze-style datasets. We present experiments supporting the emergence of "predication structure" in the hidden state vectors of these readers. More specifically, we provide evidence that the hidden state vectors represent atomic formulas Φ[c], where Φ is a semantic property (predicate) and c is a constant symbol (an entity identifier). Comment: Accepted for Repl4NLP: 2nd Workshop on Representation Learning for NLP.
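
    To make the claim concrete, a toy reading of "predication structure" (my interpretation for illustration, not the paper's experiment) is that a hidden vector behaves approximately like a predicate embedding plus an entity embedding, so the entity can be recovered by a nearest-neighbour lookup after removing the predicate part:

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 64
predicates = {p: rng.normal(size=dim) for p in ["born_in", "capital_of"]}
entities = {e: rng.normal(size=dim) for e in ["@entity1", "@entity2", "@entity3"]}

# Toy hidden state for the atomic formula born_in[@entity2]:
# predicate vector + entity vector (plus noise standing in for everything else).
hidden = predicates["born_in"] + entities["@entity2"] + 0.1 * rng.normal(size=dim)

# If the structure really is additive, subtracting the predicate part should
# leave something closest to the correct entity identifier.
residual = hidden - predicates["born_in"]
best = max(entities, key=lambda e: np.dot(residual, entities[e]))
print(best)  # -> "@entity2" with high probability for random high-dim vectors
```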